A generic function used to describe an object for use by an LLM.
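Because `btw_this()` is an S3 generic, R chooses a method from the class of its argument. That is why the methods listed under See also handle character strings, data frames and environments separately. The sketch below shows the same dispatch pattern with a hypothetical generic, `describe_for_llm()`. It is not btw's implementation, and the text each method returns is an assumption made for illustration.

```r
# Hypothetical generic for illustration only; it is NOT part of btw.
describe_for_llm <- function(x, ...) {
  UseMethod("describe_for_llm")
}

# Fallback method: report the class and the str() summary of any object.
describe_for_llm.default <- function(x, ...) {
  c(
    paste0("class: ", paste(class(x), collapse = ", ")),
    utils::capture.output(utils::str(x))
  )
}

# data.frame method: report the dimensions and the type of each column.
describe_for_llm.data.frame <- function(x, ...) {
  types <- vapply(x, function(col) class(col)[1], character(1))
  c(
    sprintf("A data frame with %d rows and %d columns.", nrow(x), ncol(x)),
    paste0("- ", names(types), ": ", types)
  )
}

# function method: return the deparsed source code as text.
describe_for_llm.function <- function(x, ...) {
  deparse(x)
}

describe_for_llm(datasets::mtcars)[1]
#> [1] "A data frame with 32 rows and 11 columns."
```

Adding support for a new class only requires defining another `describe_for_llm.<class>()` function. The generic itself stays unchanged.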
See also
Other btw_this() methods:
btw_this.character(),
btw_this.data.frame(),
btw_this.environment()
Examples
btw_this(mtcars)
#> [1] "```json"
#> [2] "{\"n_cols\":11,\"n_rows\":32,\"groups\":[],\"class\":\"data.frame\",\"columns\":{\"mpg\":{\"variable\":\"mpg\",\"type\":\"numeric\",\"mean\":20.0906,\"sd\":6.0269,\"p0\":10.4,\"p25\":15.425,\"p50\":19.2,\"p75\":22.8,\"p100\":33.9},\"cyl\":{\"variable\":\"cyl\",\"type\":\"numeric\",\"mean\":6.1875,\"sd\":1.7859,\"p0\":4,\"p25\":4,\"p50\":6,\"p75\":8,\"p100\":8},\"disp\":{\"variable\":\"disp\",\"type\":\"numeric\",\"mean\":230.7219,\"sd\":123.9387,\"p0\":71.1,\"p25\":120.825,\"p50\":196.3,\"p75\":326,\"p100\":472},\"hp\":{\"variable\":\"hp\",\"type\":\"numeric\",\"mean\":146.6875,\"sd\":68.5629,\"p0\":52,\"p25\":96.5,\"p50\":123,\"p75\":180,\"p100\":335},\"drat\":{\"variable\":\"drat\",\"type\":\"numeric\",\"mean\":3.5966,\"sd\":0.5347,\"p0\":2.76,\"p25\":3.08,\"p50\":3.695,\"p75\":3.92,\"p100\":4.93},\"wt\":{\"variable\":\"wt\",\"type\":\"numeric\",\"mean\":3.2172,\"sd\":0.9785,\"p0\":1.513,\"p25\":2.5812,\"p50\":3.325,\"p75\":3.61,\"p100\":5.424},\"qsec\":{\"variable\":\"qsec\",\"type\":\"numeric\",\"mean\":17.8487,\"sd\":1.7869,\"p0\":14.5,\"p25\":16.8925,\"p50\":17.71,\"p75\":18.9,\"p100\":22.9},\"vs\":{\"variable\":\"vs\",\"type\":\"numeric\",\"mean\":0.4375,\"sd\":0.504,\"p0\":0,\"p25\":0,\"p50\":0,\"p75\":1,\"p100\":1},\"am\":{\"variable\":\"am\",\"type\":\"numeric\",\"mean\":0.4062,\"sd\":0.499,\"p0\":0,\"p25\":0,\"p50\":0,\"p75\":1,\"p100\":1},\"gear\":{\"variable\":\"gear\",\"type\":\"numeric\",\"mean\":3.6875,\"sd\":0.7378,\"p0\":3,\"p25\":3,\"p50\":4,\"p75\":4,\"p100\":5},\"carb\":{\"variable\":\"carb\",\"type\":\"numeric\",\"mean\":2.8125,\"sd\":1.6152,\"p0\":1,\"p25\":2,\"p50\":2,\"p75\":4,\"p100\":8}}}"
#> [3] "```"
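For each `mtcars` column, the JSON above reports a mean, a standard deviation (`sd`) and five summary points `p0` to `p100`. To help read those fields, the sketch below recomputes them in base R. It assumes that `p0`, `p25`, `p50`, `p75` and `p100` are the 0%, 25%, 50%, 75% and 100% quantiles from R's default `quantile()` method and that every value is rounded to four decimal places; btw may compute them differently. The helper `summarise_numeric()` is hypothetical.

```r
# Hypothetical helper: per-column statistics resembling the JSON fields above.
# Assumption: p0..p100 are quantile() values (default type 7), rounded to 4 dp.
summarise_numeric <- function(x) {
  probs <- c(0, 0.25, 0.5, 0.75, 1)
  q <- stats::quantile(x, probs = probs, names = FALSE, na.rm = TRUE)
  stats_vec <- c(mean = mean(x, na.rm = TRUE), sd = stats::sd(x, na.rm = TRUE), q)
  # Label the quantiles p0, p25, p50, p75, p100 to match the JSON keys.
  names(stats_vec)[3:7] <- paste0("p", probs * 100)
  round(stats_vec, 4)
}

# Apply the helper to every column of mtcars.
mtcars_stats <- lapply(datasets::mtcars, summarise_numeric)
mtcars_stats$mpg
```

For `mpg`, this gives the same values as the JSON: mean 20.0906, sd 6.0269, p0 10.4, p25 15.425, p50 19.2, p75 22.8 and p100 33.9.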
btw_this(dplyr::mutate)
#> [1] "```r" "function (.data, ...) "
#> [3] "{" " UseMethod(\"mutate\")"
#> [5] "}" "```"
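For a function, the output above is the function's source wrapped in an R code fence, one element per line. The sketch below produces a character vector with that same shape by using `deparse()`. It is an assumption about the general approach, not btw's code, and `fence_function()` is a hypothetical name.

```r
# Hypothetical helper: wrap a function's deparsed source in a markdown fence,
# mirroring the shape of the btw_this(dplyr::mutate) output above.
fence_function <- function(f) {
  fence <- strrep("`", 3)  # three backticks
  c(paste0(fence, "r"), deparse(f), fence)
}

fence_function(function(x, y) x + y)
```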
btw_this("{dplyr}")
#> [1] "```json"
#> [2] "[\n {\"topic_id\":\"across\",\"title\":\"Apply a function (or functions) across multiple columns\",\"aliases\":[\"across\",\"if_any\",\"if_all\"]},\n {\"topic_id\":\"add_rownames\",\"title\":\"Convert row names to an explicit variable.\",\"aliases\":[\"add_rownames\"]},\n {\"topic_id\":\"all_equal\",\"title\":\"Flexible equality comparison for data frames\",\"aliases\":[\"all_equal\"]},\n {\"topic_id\":\"all_vars\",\"title\":\"Apply predicate to all variables\",\"aliases\":[\"all_vars\",\"any_vars\"]},\n {\"topic_id\":\"args_by\",\"title\":\"Helper for consistent documentation of '.by'\",\"aliases\":[\"args_by\"]},\n {\"topic_id\":\"arrange\",\"title\":\"Order rows using column values\",\"aliases\":[\"arrange\",\"arrange.data.frame\"]},\n {\"topic_id\":\"arrange_all\",\"title\":\"Arrange rows by a selection of variables\",\"aliases\":[\"arrange_all\",\"arrange_at\",\"arrange_if\"]},\n {\"topic_id\":\"auto_copy\",\"title\":\"Copy tables to same source, if necessary\",\"aliases\":[\"auto_copy\"]},\n {\"topic_id\":\"backend_dbplyr\",\"title\":\"Database and SQL generics.\",\"aliases\":[\"backend_dbplyr\",\"db_desc\",\"sql_translate_env\",\"db_list_tables\",\"db_has_table\",\"db_data_type\",\"db_save_query\",\"db_begin\",\"db_commit\",\"db_rollback\",\"db_write_table\",\"db_create_table\",\"db_insert_into\",\"db_create_indexes\",\"db_create_index\",\"db_drop_table\",\"db_analyze\",\"db_explain\",\"db_query_fields\",\"db_query_rows\",\"sql_select\",\"sql_subquery\",\"sql_join\",\"sql_semi_join\",\"sql_set_op\",\"sql_escape_string\",\"sql_escape_ident\"]},\n {\"topic_id\":\"band_members\",\"title\":\"Band membership\",\"aliases\":[\"band_members\",\"band_instruments\",\"band_instruments2\"]},\n {\"topic_id\":\"base\",\"title\":\"dplyr <-> base R\",\"aliases\":[\"base\"]},\n {\"topic_id\":\"between\",\"title\":\"Detect where values fall in a specified range\",\"aliases\":[\"between\"]},\n {\"topic_id\":\"bind_cols\",\"title\":\"Bind multiple data frames by 
column\",\"aliases\":[\"bind_cols\"]},\n {\"topic_id\":\"bind_rows\",\"title\":\"Bind multiple data frames by row\",\"aliases\":[\"bind_rows\",\"bind\"]},\n {\"topic_id\":\"c_across\",\"title\":\"Combine values from multiple columns\",\"aliases\":[\"c_across\"]},\n {\"topic_id\":\"case_match\",\"title\":\"A general vectorised 'switch()'\",\"aliases\":[\"case_match\"]},\n {\"topic_id\":\"case_when\",\"title\":\"A general vectorised if-else\",\"aliases\":[\"case_when\"]},\n {\"topic_id\":\"check_dbplyr\",\"title\":\"dbplyr compatibility functions\",\"aliases\":[\"check_dbplyr\",\"wrap_dbplyr_obj\"]},\n {\"topic_id\":\"coalesce\",\"title\":\"Find the first non-missing element\",\"aliases\":[\"coalesce\"]},\n {\"topic_id\":\"colwise\",\"title\":\"Column-wise operations\",\"aliases\":[\"colwise\"]},\n {\"topic_id\":\"combine\",\"title\":\"Combine vectors\",\"aliases\":[\"combine\"]},\n {\"topic_id\":\"common_by\",\"title\":\"Extract out common by variables\",\"aliases\":[\"common_by\"]},\n {\"topic_id\":\"compute\",\"title\":\"Force computation of a database query\",\"aliases\":[\"compute\",\"collect\",\"collapse\"]},\n {\"topic_id\":\"consecutive_id\",\"title\":\"Generate a unique identifier for consecutive combinations\",\"aliases\":[\"consecutive_id\"]},\n {\"topic_id\":\"context\",\"title\":\"Information about the \\\"current\\\" group or variable\",\"aliases\":[\"context\",\"n\",\"cur_group\",\"cur_group_id\",\"cur_group_rows\",\"cur_column\"]},\n {\"topic_id\":\"copy_to\",\"title\":\"Copy a local data frame to a remote src\",\"aliases\":[\"copy_to\"]},\n {\"topic_id\":\"count\",\"title\":\"Count the observations in each group\",\"aliases\":[\"count\",\"count.data.frame\",\"tally\",\"add_count\",\"add_tally\"]},\n {\"topic_id\":\"cross_join\",\"title\":\"Cross join\",\"aliases\":[\"cross_join\"]},\n {\"topic_id\":\"cumall\",\"title\":\"Cumulativate versions of any, all, and mean\",\"aliases\":[\"cumall\",\"cumany\",\"cummean\"]},\n 
{\"topic_id\":\"defunct\",\"title\":\"Defunct functions\",\"aliases\":[\"defunct\",\"id\",\"failwith\",\"select_vars\",\"rename_vars\",\"select_var\",\"current_vars\",\"bench_tbls\",\"compare_tbls\",\"compare_tbls2\",\"eval_tbls\",\"eval_tbls2\",\"location\",\"changes\"]},\n {\"topic_id\":\"deprec-context\",\"title\":\"Information about the \\\"current\\\" group or variable\",\"aliases\":[\"deprec-context\",\"cur_data\",\"cur_data_all\"]},\n {\"topic_id\":\"desc\",\"title\":\"Descending order\",\"aliases\":[\"desc\"]},\n {\"topic_id\":\"dim_desc\",\"title\":\"Describing dimensions\",\"aliases\":[\"dim_desc\"]},\n {\"topic_id\":\"distinct\",\"title\":\"Keep distinct/unique rows\",\"aliases\":[\"distinct\"]},\n {\"topic_id\":\"distinct_all\",\"title\":\"Select distinct rows by a selection of variables\",\"aliases\":[\"distinct_all\",\"distinct_at\",\"distinct_if\"]},\n {\"topic_id\":\"distinct_prepare\",\"title\":\"Same basic philosophy as group_by_prepare(): lazy_dots comes in, list of data and vars (character vector) comes out.\",\"aliases\":[\"distinct_prepare\",\"group_by_prepare\"]},\n {\"topic_id\":\"do\",\"title\":\"Do anything\",\"aliases\":[\"do\"]},\n {\"topic_id\":\"dplyr\",\"title\":\"Introduction to dplyr\",\"aliases\":[\"dplyr\"]},\n {\"topic_id\":\"dplyr-locale\",\"title\":\"Locale used by 'arrange()'\",\"aliases\":[\"dplyr-locale\"]},\n {\"topic_id\":\"dplyr-package\",\"title\":\"dplyr: A Grammar of Data Manipulation\",\"aliases\":[\"dplyr\",\"dplyr-package\"]},\n {\"topic_id\":\"dplyr_by\",\"title\":\"Per-operation grouping with '.by'/'by'\",\"aliases\":[\"dplyr_by\"]},\n {\"topic_id\":\"dplyr_data_masking\",\"title\":\"Data-masking\",\"aliases\":[\"dplyr_data_masking\"]},\n {\"topic_id\":\"dplyr_extending\",\"title\":\"Extending dplyr with new data frame subclasses\",\"aliases\":[\"dplyr_extending\",\"dplyr_row_slice\",\"dplyr_col_modify\",\"dplyr_reconstruct\"]},\n {\"topic_id\":\"dplyr_tidy_select\",\"title\":\"Argument type: 
tidy-select\",\"aliases\":[\"dplyr_tidy_select\"]},\n {\"topic_id\":\"explain\",\"title\":\"Explain details of a tbl\",\"aliases\":[\"explain\",\"show_query\"]},\n {\"topic_id\":\"filter\",\"title\":\"Keep rows that match a condition\",\"aliases\":[\"filter\"]},\n {\"topic_id\":\"filter-joins\",\"title\":\"Filtering joins\",\"aliases\":[\"filter-joins\",\"semi_join\",\"semi_join.data.frame\",\"anti_join\",\"anti_join.data.frame\"]},\n {\"topic_id\":\"filter_all\",\"title\":\"Filter within a selection of variables\",\"aliases\":[\"filter_all\",\"filter_if\",\"filter_at\"]},\n {\"topic_id\":\"funs\",\"title\":\"Create a list of function calls\",\"aliases\":[\"funs\"]},\n {\"topic_id\":\"glimpse\",\"title\":\"Get a glimpse of your data\",\"aliases\":[\"glimpse\"]},\n {\"topic_id\":\"group_by\",\"title\":\"Group by one or more variables\",\"aliases\":[\"group_by\",\"ungroup\"]},\n {\"topic_id\":\"group_by_all\",\"title\":\"Group by a selection of variables\",\"aliases\":[\"group_by_all\",\"group_by_at\",\"group_by_if\"]},\n {\"topic_id\":\"group_by_drop_default\",\"title\":\"Default value for .drop argument of group_by\",\"aliases\":[\"group_by_drop_default\"]},\n {\"topic_id\":\"group_cols\",\"title\":\"Select grouping variables\",\"aliases\":[\"group_cols\"]},\n {\"topic_id\":\"group_data\",\"title\":\"Grouping metadata\",\"aliases\":[\"group_data\",\"group_keys\",\"group_rows\",\"group_indices\",\"group_vars\",\"groups\",\"group_size\",\"n_groups\"]},\n {\"topic_id\":\"group_map\",\"title\":\"Apply a function to each group\",\"aliases\":[\"group_map\",\"group_modify\",\"group_walk\"]},\n {\"topic_id\":\"group_nest\",\"title\":\"Nest a tibble using a grouping specification\",\"aliases\":[\"group_nest\"]},\n {\"topic_id\":\"group_split\",\"title\":\"Split data frame by groups\",\"aliases\":[\"group_split\"]},\n {\"topic_id\":\"group_trim\",\"title\":\"Trim grouping structure\",\"aliases\":[\"group_trim\"]},\n {\"topic_id\":\"grouped_df\",\"title\":\"A grouped data 
frame.\",\"aliases\":[\"grouped_df\",\"is.grouped_df\",\"is_grouped_df\"]},\n {\"topic_id\":\"grouping\",\"title\":\"Grouped data\",\"aliases\":[\"grouping\"]},\n {\"topic_id\":\"ident\",\"title\":\"Flag a character vector as SQL identifiers\",\"aliases\":[\"ident\"]},\n {\"topic_id\":\"if_else\",\"title\":\"Vectorised if-else\",\"aliases\":[\"if_else\"]},\n {\"topic_id\":\"in-packages\",\"title\":\"Using dplyr in packages\",\"aliases\":[\"in-packages\"]},\n {\"topic_id\":\"join_by\",\"title\":\"Join specifications\",\"aliases\":[\"join_by\",\"closest\",\"overlaps\",\"within\"]},\n {\"topic_id\":\"last_dplyr_warnings\",\"title\":\"Show warnings from the last command\",\"aliases\":[\"last_dplyr_warnings\"]},\n {\"topic_id\":\"lead-lag\",\"title\":\"Compute lagged or leading values\",\"aliases\":[\"lead-lag\",\"lag\",\"lead\"]},\n {\"topic_id\":\"make_tbl\",\"title\":\"Create a \\\"tbl\\\" object\",\"aliases\":[\"make_tbl\"]},\n {\"topic_id\":\"mutate\",\"title\":\"Create, modify, and delete columns\",\"aliases\":[\"mutate\",\"mutate.data.frame\"]},\n {\"topic_id\":\"mutate-joins\",\"title\":\"Mutating joins\",\"aliases\":[\"mutate-joins\",\"join\",\"join.data.frame\",\"inner_join\",\"inner_join.data.frame\",\"left_join\",\"left_join.data.frame\",\"right_join\",\"right_join.data.frame\",\"full_join\",\"full_join.data.frame\"]},\n {\"topic_id\":\"mutate_all\",\"title\":\"Mutate multiple columns\",\"aliases\":[\"mutate_all\",\"mutate_if\",\"mutate_at\",\"transmute_all\",\"transmute_if\",\"transmute_at\"]},\n {\"topic_id\":\"n_distinct\",\"title\":\"Count unique combinations\",\"aliases\":[\"n_distinct\"]},\n {\"topic_id\":\"na_if\",\"title\":\"Convert values to 'NA'\",\"aliases\":[\"na_if\"]},\n {\"topic_id\":\"near\",\"title\":\"Compare two numeric vectors\",\"aliases\":[\"near\"]},\n {\"topic_id\":\"nest_by\",\"title\":\"Nest by one or more variables\",\"aliases\":[\"nest_by\"]},\n {\"topic_id\":\"nest_join\",\"title\":\"Nest 
join\",\"aliases\":[\"nest_join\",\"nest_join.data.frame\"]},\n {\"topic_id\":\"new_grouped_df\",\"title\":\"Low-level construction and validation for the grouped_df and rowwise_df classes\",\"aliases\":[\"new_grouped_df\",\"validate_grouped_df\",\"new_rowwise_df\",\"validate_rowwise_df\"]},\n {\"topic_id\":\"nth\",\"title\":\"Extract the first, last, or nth value from a vector\",\"aliases\":[\"nth\",\"first\",\"last\"]},\n {\"topic_id\":\"ntile\",\"title\":\"Bucket a numeric vector into 'n' groups\",\"aliases\":[\"ntile\"]},\n {\"topic_id\":\"order_by\",\"title\":\"A helper function for ordering window function output\",\"aliases\":[\"order_by\"]},\n {\"topic_id\":\"percent_rank\",\"title\":\"Proportional ranking functions\",\"aliases\":[\"percent_rank\",\"cume_dist\"]},\n {\"topic_id\":\"pick\",\"title\":\"Select a subset of columns\",\"aliases\":[\"pick\"]},\n {\"topic_id\":\"programming\",\"title\":\"Programming with dplyr\",\"aliases\":[\"programming\"]},\n {\"topic_id\":\"progress_estimated\",\"title\":\"Progress bar with estimated time.\",\"aliases\":[\"progress_estimated\"]},\n {\"topic_id\":\"pull\",\"title\":\"Extract a single column\",\"aliases\":[\"pull\"]},\n {\"topic_id\":\"recode\",\"title\":\"Recode values\",\"aliases\":[\"recode\",\"recode_factor\"]},\n {\"topic_id\":\"reexports\",\"title\":\"Objects exported from other packages\",\"aliases\":[\"reexports\",\"%>%\",\"type_sum\",\"data_frame\",\"as_data_frame\",\"lst\",\"add_row\",\"tribble\",\"tibble\",\"as_tibble\",\"view\",\"contains\",\"select_helpers\",\"ends_with\",\"everything\",\"matches\",\"num_range\",\"one_of\",\"starts_with\",\"last_col\",\"any_of\",\"all_of\",\"where\"]},\n {\"topic_id\":\"reframe\",\"title\":\"Transform each group to an arbitrary number of rows\",\"aliases\":[\"reframe\"]},\n {\"topic_id\":\"relocate\",\"title\":\"Change column order\",\"aliases\":[\"relocate\"]},\n {\"topic_id\":\"rename\",\"title\":\"Rename columns\",\"aliases\":[\"rename\",\"rename_with\"]},\n 
{\"topic_id\":\"row_number\",\"title\":\"Integer ranking functions\",\"aliases\":[\"row_number\",\"min_rank\",\"dense_rank\"]},\n {\"topic_id\":\"rows\",\"title\":\"Manipulate individual rows\",\"aliases\":[\"rows\",\"rows_insert\",\"rows_append\",\"rows_update\",\"rows_patch\",\"rows_upsert\",\"rows_delete\"]},\n {\"topic_id\":\"rowwise\",\"title\":\"Group input by rows\",\"aliases\":[\"rowwise\",\"rowwise\"]},\n {\"topic_id\":\"same_src\",\"title\":\"Figure out if two sources are the same (or two tbl have the same source)\",\"aliases\":[\"same_src\"]},\n {\"topic_id\":\"sample_n\",\"title\":\"Sample n rows from a table\",\"aliases\":[\"sample_n\",\"sample_frac\"]},\n {\"topic_id\":\"scoped\",\"title\":\"Operate on a selection of variables\",\"aliases\":[\"scoped\"]},\n {\"topic_id\":\"se-deprecated\",\"title\":\"Deprecated SE versions of main verbs.\",\"aliases\":[\"se-deprecated\",\"add_count_\",\"add_tally_\",\"arrange_\",\"count_\",\"distinct_\",\"do_\",\"filter_\",\"funs_\",\"group_by_\",\"group_indices_\",\"mutate_\",\"tally_\",\"transmute_\",\"rename_\",\"rename_vars_\",\"select_\",\"select_vars_\",\"slice_\",\"summarise_\",\"summarize_\"]},\n {\"topic_id\":\"select\",\"title\":\"Keep or drop columns using their names and types\",\"aliases\":[\"select\"]},\n {\"topic_id\":\"select_all\",\"title\":\"Select and rename a selection of variables\",\"aliases\":[\"select_all\",\"rename_all\",\"select_if\",\"rename_if\",\"select_at\",\"rename_at\"]},\n {\"topic_id\":\"setops\",\"title\":\"Set operations\",\"aliases\":[\"setops\",\"intersect\",\"union\",\"union_all\",\"setdiff\",\"setequal\",\"symdiff\"]},\n {\"topic_id\":\"slice\",\"title\":\"Subset rows using their positions\",\"aliases\":[\"slice\",\"slice_head\",\"slice_tail\",\"slice_min\",\"slice_max\",\"slice_sample\"]},\n {\"topic_id\":\"sql\",\"title\":\"SQL escaping.\",\"aliases\":[\"sql\"]},\n {\"topic_id\":\"src\",\"title\":\"Create a \\\"src\\\" object\",\"aliases\":[\"src\",\"is.src\"]},\n 
{\"topic_id\":\"src_dbi\",\"title\":\"Source for database backends\",\"aliases\":[\"src_dbi\",\"src_mysql\",\"src_postgres\",\"src_sqlite\"]},\n {\"topic_id\":\"src_local\",\"title\":\"A local source\",\"aliases\":[\"src_local\",\"src_df\"]},\n {\"topic_id\":\"src_tbls\",\"title\":\"List all tbls provided by a source.\",\"aliases\":[\"src_tbls\"]},\n {\"topic_id\":\"starwars\",\"title\":\"Starwars characters\",\"aliases\":[\"starwars\"]},\n {\"topic_id\":\"storms\",\"title\":\"Storm tracks data\",\"aliases\":[\"storms\"]},\n {\"topic_id\":\"summarise\",\"title\":\"Summarise each group down to one row\",\"aliases\":[\"summarise\",\"summarize\"]},\n {\"topic_id\":\"summarise_all\",\"title\":\"Summarise multiple columns\",\"aliases\":[\"summarise_all\",\"summarise_if\",\"summarise_at\",\"summarize_all\",\"summarize_if\",\"summarize_at\"]},\n {\"topic_id\":\"summarise_each\",\"title\":\"Summarise and mutate multiple columns.\",\"aliases\":[\"summarise_each\",\"summarise_each_\",\"mutate_each\",\"mutate_each_\",\"summarize_each\",\"summarize_each_\"]},\n {\"topic_id\":\"tbl\",\"title\":\"Create a table from a data source\",\"aliases\":[\"tbl\",\"is.tbl\"]},\n {\"topic_id\":\"tbl_df\",\"title\":\"Coerce to a tibble\",\"aliases\":[\"tbl_df\",\"as.tbl\"]},\n {\"topic_id\":\"tbl_ptype\",\"title\":\"Return a prototype of a tbl\",\"aliases\":[\"tbl_ptype\"]},\n {\"topic_id\":\"tbl_vars\",\"title\":\"List variables provided by a tbl.\",\"aliases\":[\"tbl_vars\",\"tbl_nongroup_vars\"]},\n {\"topic_id\":\"tidyeval-compat\",\"title\":\"Other tidy eval tools\",\"aliases\":[\"tidyeval-compat\",\".data\",\"expr\",\"enquo\",\"enquos\",\"sym\",\"syms\",\"as_label\",\"quo\",\"quos\",\"quo_name\",\"ensym\",\"ensyms\",\"enexpr\",\"enexprs\"]},\n {\"topic_id\":\"top_n\",\"title\":\"Select top (or bottom) n rows (by value)\",\"aliases\":[\"top_n\",\"top_frac\"]},\n {\"topic_id\":\"transmute\",\"title\":\"Create, modify, and delete columns\",\"aliases\":[\"transmute\"]},\n 
{\"topic_id\":\"two-table\",\"title\":\"Two-table verbs\",\"aliases\":[\"two-table\"]},\n {\"topic_id\":\"vars\",\"title\":\"Select variables\",\"aliases\":[\"vars\"]},\n {\"topic_id\":\"window-functions\",\"title\":\"Window functions\",\"aliases\":[\"window-functions\"]},\n {\"topic_id\":\"with_groups\",\"title\":\"Perform an operation with temporary groups\",\"aliases\":[\"with_groups\"]},\n {\"topic_id\":\"with_order\",\"title\":\"Run a function with one order, translating result back to original order\",\"aliases\":[\"with_order\"]}\n]"
#> [3] "```"
#> [4] ""
#> [5] "# Introduction to dplyr {#introduction-to-dplyr .title .toc-ignore}"
#> [6] ""
#> [7] "When working with data you must:"
#> [8] ""
#> [9] "- Figure out what you want to do."
#> [10] ""
#> [11] "- Describe those tasks in the form of a computer program."
#> [12] ""
#> [13] "- Execute the program."
#> [14] ""
#> [15] "The dplyr package makes these steps fast and easy:"
#> [16] ""
#> [17] "- By constraining your options, it helps you think about your data"
#> [18] " manipulation challenges."
#> [19] ""
#> [20] "- It provides simple \"verbs\", functions that correspond to the most"
#> [21] " common data manipulation tasks, to help you translate your thoughts"
#> [22] " into code."
#> [23] ""
#> [24] "- It uses efficient backends, so you spend less time waiting for the"
#> [25] " computer."
#> [26] ""
#> [27] "This document introduces you to dplyr's basic set of tools, and shows"
#> [28] "you how to apply them to data frames. dplyr also supports databases via"
#> [29] "the dbplyr package, once you've installed, read `vignette(\"dbplyr\")` to"
#> [30] "learn more."
#> [31] ""
#> [32] "::: {#data-starwars .section .level2}"
#> [33] "## Data: starwars"
#> [34] ""
#> [35] "To explore the basic data manipulation verbs of dplyr, we'll use the"
#> [36] "dataset `starwars`. This dataset contains 87 characters and comes from"
#> [37] "the [Star Wars API](https://swapi.dev), and is documented in `?starwars`"
#> [38] ""
#> [39] "::: {#cb1 .sourceCode}"
#> [40] "``` {.sourceCode .r}"
#> [41] "dim(starwars)"
#> [42] "#> [1] 87 14"
#> [43] "starwars"
#> [44] "#> # A tibble: 87 × 14"
#> [45] "#> name height mass hair_color skin_color eye_color birth_year sex gender"
#> [46] "#> <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr> "
#> [47] "#> 1 Luke Sky… 172 77 blond fair blue 19 male mascu…"
#> [48] "#> 2 C-3PO 167 75 <NA> gold yellow 112 none mascu…"
#> [49] "#> 3 R2-D2 96 32 <NA> white, bl… red 33 none mascu…"
#> [50] "#> 4 Darth Va… 202 136 none white yellow 41.9 male mascu…"
#> [51] "#> # ℹ 83 more rows"
#> [52] "#> # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,"
#> [53] "#> # vehicles <list>, starships <list>"
#> [54] "```"
#> [55] ":::"
#> [56] ""
#> [57] "Note that `starwars` is a tibble, a modern reimagining of the data"
#> [58] "frame. It's particularly useful for large datasets because it only"
#> [59] "prints the first few rows. You can learn more about tibbles at"
#> [60] "<https://tibble.tidyverse.org>; in particular you can convert data"
#> [61] "frames to tibbles with `as_tibble()`."
#> [62] ":::"
#> [63] ""
#> [64] "::: {#single-table-verbs .section .level2}"
#> [65] "## Single table verbs"
#> [66] ""
#> [67] "dplyr aims to provide a function for each basic verb of data"
#> [68] "manipulation. These verbs can be organised into three categories based"
#> [69] "on the component of the dataset that they work with:"
#> [70] ""
#> [71] "- Rows:"
#> [72] " - `filter()` chooses rows based on column values."
#> [73] " - `slice()` chooses rows based on location."
#> [74] " - `arrange()` changes the order of the rows."
#> [75] "- Columns:"
#> [76] " - `select()` changes whether or not a column is included."
#> [77] " - `rename()` changes the name of columns."
#> [78] " - `mutate()` changes the values of columns and creates new"
#> [79] " columns."
#> [80] " - `relocate()` changes the order of the columns."
#> [81] "- Groups of rows:"
#> [82] " - `summarise()` collapses a group into a single row."
#> [83] ""
#> [84] "::: {#the-pipe .section .level3}"
#> [85] "### The pipe"
#> [86] ""
#> [87] "All of the dplyr functions take a data frame (or tibble) as the first"
#> [88] "argument. Rather than forcing the user to either save intermediate"
#> [89] "objects or nest functions, dplyr provides the `%>%` operator from"
#> [90] "magrittr. `x %>% f(y)` turns into `f(x, y)` so the result from one step"
#> [91] "is then \"piped\" into the next step. You can use the pipe to rewrite"
#> [92] "multiple operations that you can read left-to-right, top-to-bottom"
#> [93] "(reading the pipe operator as \"then\")."
#> [94] ":::"
#> [95] ""
#> [96] "::: {#filter-rows-with-filter .section .level3}"
#> [97] "### Filter rows with `filter()`"
#> [98] ""
#> [99] "`filter()` allows you to select a subset of rows in a data frame. Like"
#> [100] "all single verbs, the first argument is the tibble (or data frame). The"
#> [101] "second and subsequent arguments refer to variables within that data"
#> [102] "frame, selecting rows where the expression is `TRUE`."
#> [103] ""
#> [104] "For example, we can select all character with light skin color and brown"
#> [105] "eyes with:"
#> [106] ""
#> [107] "::: {#cb2 .sourceCode}"
#> [108] "``` {.sourceCode .r}"
#> [109] "starwars %>% filter(skin_color == \"light\", eye_color == \"brown\")"
#> [110] "#> # A tibble: 7 × 14"
#> [111] "#> name height mass hair_color skin_color eye_color birth_year sex gender"
#> [112] "#> <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr> "
#> [113] "#> 1 Leia Org… 150 49 brown light brown 19 fema… femin…"
#> [114] "#> 2 Biggs Da… 183 84 black light brown 24 male mascu…"
#> [115] "#> 3 Padmé Am… 185 45 brown light brown 46 fema… femin…"
#> [116] "#> 4 Cordé 157 NA brown light brown NA <NA> <NA> "
#> [117] "#> # ℹ 3 more rows"
#> [118] "#> # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,"
#> [119] "#> # vehicles <list>, starships <list>"
#> [120] "```"
#> [121] ":::"
#> [122] ""
#> [123] "This is roughly equivalent to this base R code:"
#> [124] ""
#> [125] "::: {#cb3 .sourceCode}"
#> [126] "``` {.sourceCode .r}"
#> [127] "starwars[starwars$skin_color == \"light\" & starwars$eye_color == \"brown\", ]"
#> [128] "```"
#> [129] ":::"
#> [130] ":::"
#> [131] ""
#> [132] "::: {#arrange-rows-with-arrange .section .level3}"
#> [133] "### Arrange rows with `arrange()`"
#> [134] ""
#> [135] "`arrange()` works similarly to `filter()` except that instead of"
#> [136] "filtering or selecting rows, it reorders them. It takes a data frame,"
#> [137] "and a set of column names (or more complicated expressions) to order by."
#> [138] "If you provide more than one column name, each additional column will be"
#> [139] "used to break ties in the values of preceding columns:"
#> [140] ""
#> [141] "::: {#cb4 .sourceCode}"
#> [142] "``` {.sourceCode .r}"
#> [143] "starwars %>% arrange(height, mass)"
#> [144] "#> # A tibble: 87 × 14"
#> [145] "#> name height mass hair_color skin_color eye_color birth_year sex gender"
#> [146] "#> <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr> "
#> [147] "#> 1 Yoda 66 17 white green brown 896 male mascu…"
#> [148] "#> 2 Ratts Ty… 79 15 none grey, blue unknown NA male mascu…"
#> [149] "#> 3 Wicket S… 88 20 brown brown brown 8 male mascu…"
#> [150] "#> 4 Dud Bolt 94 45 none blue, grey yellow NA male mascu…"
#> [151] "#> # ℹ 83 more rows"
#> [152] "#> # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,"
#> [153] "#> # vehicles <list>, starships <list>"
#> [154] "```"
#> [155] ":::"
#> [156] ""
#> [157] "Use `desc()` to order a column in descending order:"
#> [158] ""
#> [159] "::: {#cb5 .sourceCode}"
#> [160] "``` {.sourceCode .r}"
#> [161] "starwars %>% arrange(desc(height))"
#> [162] "#> # A tibble: 87 × 14"
#> [163] "#> name height mass hair_color skin_color eye_color birth_year sex gender"
#> [164] "#> <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr> "
#> [165] "#> 1 Yarael P… 264 NA none white yellow NA male mascu…"
#> [166] "#> 2 Tarfful 234 136 brown brown blue NA male mascu…"
#> [167] "#> 3 Lama Su 229 88 none grey black NA male mascu…"
#> [168] "#> 4 Chewbacca 228 112 brown unknown blue 200 male mascu…"
#> [169] "#> # ℹ 83 more rows"
#> [170] "#> # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,"
#> [171] "#> # vehicles <list>, starships <list>"
#> [172] "```"
#> [173] ":::"
#> [174] ":::"
#> [175] ""
#> [176] "::: {#choose-rows-using-their-position-with-slice .section .level3}"
#> [177] "### Choose rows using their position with `slice()`"
#> [178] ""
#> [179] "`slice()` lets you index rows by their (integer) locations. It allows"
#> [180] "you to select, remove, and duplicate rows."
#> [181] ""
#> [182] "We can get characters from row numbers 5 through 10."
#> [183] ""
#> [184] "::: {#cb6 .sourceCode}"
#> [185] "``` {.sourceCode .r}"
#> [186] "starwars %>% slice(5:10)"
#> [187] "#> # A tibble: 6 × 14"
#> [188] "#> name height mass hair_color skin_color eye_color birth_year sex gender"
#> [189] "#> <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr> "
#> [190] "#> 1 Leia Org… 150 49 brown light brown 19 fema… femin…"
#> [191] "#> 2 Owen Lars 178 120 brown, gr… light blue 52 male mascu…"
#> [192] "#> 3 Beru Whi… 165 75 brown light blue 47 fema… femin…"
#> [193] "#> 4 R5-D4 97 32 <NA> white, red red NA none mascu…"
#> [194] "#> # ℹ 2 more rows"
#> [195] "#> # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,"
#> [196] "#> # vehicles <list>, starships <list>"
#> [197] "```"
#> [198] ":::"
#> [199] ""
#> [200] "It is accompanied by a number of helpers for common use cases:"
#> [201] ""
#> [202] "- `slice_head()` and `slice_tail()` select the first or last rows."
#> [203] ""
#> [204] "::: {#cb7 .sourceCode}"
#> [205] "``` {.sourceCode .r}"
#> [206] "starwars %>% slice_head(n = 3)"
#> [207] "#> # A tibble: 3 × 14"
#> [208] "#> name height mass hair_color skin_color eye_color birth_year sex gender"
#> [209] "#> <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr> "
#> [210] "#> 1 Luke Sky… 172 77 blond fair blue 19 male mascu…"
#> [211] "#> 2 C-3PO 167 75 <NA> gold yellow 112 none mascu…"
#> [212] "#> 3 R2-D2 96 32 <NA> white, bl… red 33 none mascu…"
#> [213] "#> # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,"
#> [214] "#> # vehicles <list>, starships <list>"
#> [215] "```"
#> [216] ":::"
#> [217] ""
#> [218] "- `slice_sample()` randomly selects rows. Use the option prop to"
#> [219] " choose a certain proportion of the cases."
#> [220] ""
#> [221] "::: {#cb8 .sourceCode}"
#> [222] "``` {.sourceCode .r}"
#> [223] "starwars %>% slice_sample(n = 5)"
#> [224] "#> # A tibble: 5 × 14"
#> [225] "#> name height mass hair_color skin_color eye_color birth_year sex gender"
#> [226] "#> <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr> "
#> [227] "#> 1 Ayla Sec… 178 55 none blue hazel 48 fema… femin…"
#> [228] "#> 2 Bossk 190 113 none green red 53 male mascu…"
#> [229] "#> 3 San Hill 191 NA none grey gold NA male mascu…"
#> [230] "#> 4 Luminara… 170 56.2 black yellow blue 58 fema… femin…"
#> [231] "#> # ℹ 1 more row"
#> [232] "#> # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,"
#> [233] "#> # vehicles <list>, starships <list>"
#> [234] "starwars %>% slice_sample(prop = 0.1)"
#> [235] "#> # A tibble: 8 × 14"
#> [236] "#> name height mass hair_color skin_color eye_color birth_year sex gender"
#> [237] "#> <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr> "
#> [238] "#> 1 Qui-Gon … 193 89 brown fair blue 92 male mascu…"
#> [239] "#> 2 Jango Fe… 183 79 black tan brown 66 male mascu…"
#> [240] "#> 3 Jocasta … 167 NA white fair blue NA fema… femin…"
#> [241] "#> 4 Zam Wese… 168 55 blonde fair, gre… yellow NA fema… femin…"
#> [242] "#> # ℹ 4 more rows"
#> [243] "#> # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,"
#> [244] "#> # vehicles <list>, starships <list>"
#> [245] "```"
#> [246] ":::"
#> [247] ""
#> [248] "Use `replace = TRUE` to perform a bootstrap sample. If needed, you can"
#> [249] "weight the sample with the `weight` argument."
#> [250] ""
#> [251] "- `slice_min()` and `slice_max()` select rows with highest or lowest"
#> [252] " values of a variable. Note that we first must choose only the values"
#> [253] " which are not NA."
#> [254] ""
#> [255] "::: {#cb9 .sourceCode}"
#> [256] "``` {.sourceCode .r}"
#> [257] "starwars %>%"
#> [258] " filter(!is.na(height)) %>%"
#> [259] " slice_max(height, n = 3)"
#> [260] "#> # A tibble: 3 × 14"
#> [261] "#> name height mass hair_color skin_color eye_color birth_year sex gender"
#> [262] "#> <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr> "
#> [263] "#> 1 Yarael P… 264 NA none white yellow NA male mascu…"
#> [264] "#> 2 Tarfful 234 136 brown brown blue NA male mascu…"
#> [265] "#> 3 Lama Su 229 88 none grey black NA male mascu…"
#> [266] "#> # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,"
#> [267] "#> # vehicles <list>, starships <list>"
#> [268] "```"
#> [269] ":::"
#> [270] ":::"
#> [271] ""
#> [272] "::: {#select-columns-with-select .section .level3}"
#> [273] "### Select columns with `select()`"
#> [274] ""
#> [275] "Often you work with large datasets with many columns but only a few are"
#> [276] "actually of interest to you. `select()` allows you to rapidly zoom in on"
#> [277] "a useful subset using operations that usually only work on numeric"
#> [278] "variable positions:"
#> [279] ""
#> [280] "::: {#cb10 .sourceCode}"
#> [281] "``` {.sourceCode .r}"
#> [282] "# Select columns by name"
#> [283] "starwars %>% select(hair_color, skin_color, eye_color)"
#> [284] "#> # A tibble: 87 × 3"
#> [285] "#> hair_color skin_color eye_color"
#> [286] "#> <chr> <chr> <chr> "
#> [287] "#> 1 blond fair blue "
#> [288] "#> 2 <NA> gold yellow "
#> [289] "#> 3 <NA> white, blue red "
#> [290] "#> 4 none white yellow "
#> [291] "#> # ℹ 83 more rows"
#> [292] "# Select all columns between hair_color and eye_color (inclusive)"
#> [293] "starwars %>% select(hair_color:eye_color)"
#> [294] "#> # A tibble: 87 × 3"
#> [295] "#> hair_color skin_color eye_color"
#> [296] "#> <chr> <chr> <chr> "
#> [297] "#> 1 blond fair blue "
#> [298] "#> 2 <NA> gold yellow "
#> [299] "#> 3 <NA> white, blue red "
#> [300] "#> 4 none white yellow "
#> [301] "#> # ℹ 83 more rows"
#> [302] "# Select all columns except those from hair_color to eye_color (inclusive)"
#> [303] "starwars %>% select(!(hair_color:eye_color))"
#> [304] "#> # A tibble: 87 × 11"
#> [305] "#> name height mass birth_year sex gender homeworld species films vehicles"
#> [306] "#> <chr> <int> <dbl> <dbl> <chr> <chr> <chr> <chr> <lis> <list> "
#> [307] "#> 1 Luke Sk… 172 77 19 male mascu… Tatooine Human <chr> <chr> "
#> [308] "#> 2 C-3PO 167 75 112 none mascu… Tatooine Droid <chr> <chr> "
#> [309] "#> 3 R2-D2 96 32 33 none mascu… Naboo Droid <chr> <chr> "
#> [310] "#> 4 Darth V… 202 136 41.9 male mascu… Tatooine Human <chr> <chr> "
#> [311] "#> # ℹ 83 more rows"
#> [312] "#> # ℹ 1 more variable: starships <list>"
#> [313] "# Select all columns ending with color"
#> [314] "starwars %>% select(ends_with(\"color\"))"
#> [315] "#> # A tibble: 87 × 3"
#> [316] "#> hair_color skin_color eye_color"
#> [317] "#> <chr> <chr> <chr> "
#> [318] "#> 1 blond fair blue "
#> [319] "#> 2 <NA> gold yellow "
#> [320] "#> 3 <NA> white, blue red "
#> [321] "#> 4 none white yellow "
#> [322] "#> # ℹ 83 more rows"
#> [323] "```"
#> [324] ":::"
#> [325] ""
#> [326] "There are a number of helper functions you can use within `select()`,"
#> [327] "like `starts_with()`, `ends_with()`, `matches()` and `contains()`. These"
#> [328] "let you quickly match larger blocks of variables that meet some"
#> [329] "criterion. See `?select` for more details."
#> [330] ""
#> [331] "You can rename variables with `select()` by using named arguments:"
#> [332] ""
#> [333] "::: {#cb11 .sourceCode}"
#> [334] "``` {.sourceCode .r}"
#> [335] "starwars %>% select(home_world = homeworld)"
#> [336] "#> # A tibble: 87 × 1"
#> [337] "#> home_world"
#> [338] "#> <chr> "
#> [339] "#> 1 Tatooine "
#> [340] "#> 2 Tatooine "
#> [341] "#> 3 Naboo "
#> [342] "#> 4 Tatooine "
#> [343] "#> # ℹ 83 more rows"
#> [344] "```"
#> [345] ":::"
#> [346] ""
#> [347] "But because `select()` drops all the variables not explicitly mentioned,"
#> [348] "it's not that useful. Instead, use `rename()`:"
#> [349] ""
#> [350] "::: {#cb12 .sourceCode}"
#> [351] "``` {.sourceCode .r}"
#> [352] "starwars %>% rename(home_world = homeworld)"
#> [353] "#> # A tibble: 87 × 14"
#> [354] "#> name height mass hair_color skin_color eye_color birth_year sex gender"
#> [355] "#> <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr> "
#> [356] "#> 1 Luke Sky… 172 77 blond fair blue 19 male mascu…"
#> [357] "#> 2 C-3PO 167 75 <NA> gold yellow 112 none mascu…"
#> [358] "#> 3 R2-D2 96 32 <NA> white, bl… red 33 none mascu…"
#> [359] "#> 4 Darth Va… 202 136 none white yellow 41.9 male mascu…"
#> [360] "#> # ℹ 83 more rows"
#> [361] "#> # ℹ 5 more variables: home_world <chr>, species <chr>, films <list>,"
#> [362] "#> # vehicles <list>, starships <list>"
#> [363] "```"
#> [364] ":::"
#> [365] ":::"
#> [366] ""
#> [367] "::: {#add-new-columns-with-mutate .section .level3}"
#> [368] "### Add new columns with `mutate()`"
#> [369] ""
#> [370] "Besides selecting sets of existing columns, it's often useful to add new"
#> [371] "columns that are functions of existing columns. This is the job of"
#> [372] "`mutate()`:"
#> [373] ""
#> [374] "::: {#cb13 .sourceCode}"
#> [375] "``` {.sourceCode .r}"
#> [376] "starwars %>% mutate(height_m = height / 100)"
#> [377] "#> # A tibble: 87 × 15"
#> [378] "#> name height mass hair_color skin_color eye_color birth_year sex gender"
#> [379] "#> <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr> "
#> [380] "#> 1 Luke Sky… 172 77 blond fair blue 19 male mascu…"
#> [381] "#> 2 C-3PO 167 75 <NA> gold yellow 112 none mascu…"
#> [382] "#> 3 R2-D2 96 32 <NA> white, bl… red 33 none mascu…"
#> [383] "#> 4 Darth Va… 202 136 none white yellow 41.9 male mascu…"
#> [384] "#> # ℹ 83 more rows"
#> [385] "#> # ℹ 6 more variables: homeworld <chr>, species <chr>, films <list>,"
#> [386] "#> # vehicles <list>, starships <list>, height_m <dbl>"
#> [387] "```"
#> [388] ":::"
#> [389] ""
#> [390] "We can't see the height in meters we just calculated, but we can fix"
#> [391] "that using a select command."
#> [392] ""
#> [393] "::: {#cb14 .sourceCode}"
#> [394] "``` {.sourceCode .r}"
#> [395] "starwars %>%"
#> [396] " mutate(height_m = height / 100) %>%"
#> [397] " select(height_m, height, everything())"
#> [398] "#> # A tibble: 87 × 15"
#> [399] "#> height_m height name mass hair_color skin_color eye_color birth_year sex "
#> [400] "#> <dbl> <int> <chr> <dbl> <chr> <chr> <chr> <dbl> <chr>"
#> [401] "#> 1 1.72 172 Luke S… 77 blond fair blue 19 male "
#> [402] "#> 2 1.67 167 C-3PO 75 <NA> gold yellow 112 none "
#> [403] "#> 3 0.96 96 R2-D2 32 <NA> white, bl… red 33 none "
#> [404] "#> 4 2.02 202 Darth … 136 none white yellow 41.9 male "
#> [405] "#> # ℹ 83 more rows"
#> [406] "#> # ℹ 6 more variables: gender <chr>, homeworld <chr>, species <chr>,"
#> [407] "#> # films <list>, vehicles <list>, starships <list>"
#> [408] "```"
#> [409] ":::"
#> [410] ""
#> [411] "`dplyr::mutate()` is similar to the base `transform()`, but allows you"
#> [412] "to refer to columns that you've just created:"
#> [413] ""
#> [414] "::: {#cb15 .sourceCode}"
#> [415] "``` {.sourceCode .r}"
#> [416] "starwars %>%"
#> [417] " mutate("
#> [418] " height_m = height / 100,"
#> [419] " BMI = mass / (height_m^2)"
#> [420] " ) %>%"
#> [421] " select(BMI, everything())"
#> [422] "#> # A tibble: 87 × 16"
#> [423] "#> BMI name height mass hair_color skin_color eye_color birth_year sex "
#> [424] "#> <dbl> <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr>"
#> [425] "#> 1 26.0 Luke Skyw… 172 77 blond fair blue 19 male "
#> [426] "#> 2 26.9 C-3PO 167 75 <NA> gold yellow 112 none "
#> [427] "#> 3 34.7 R2-D2 96 32 <NA> white, bl… red 33 none "
#> [428] "#> 4 33.3 Darth Vad… 202 136 none white yellow 41.9 male "
#> [429] "#> # ℹ 83 more rows"
#> [430] "#> # ℹ 7 more variables: gender <chr>, homeworld <chr>, species <chr>,"
#> [431] "#> # films <list>, vehicles <list>, starships <list>, height_m <dbl>"
#> [432] "```"
#> [433] ":::"
#> [434] ""
#> [435] "If you only want to keep the new variables, use `.keep = \"none\"`:"
#> [436] ""
#> [437] "::: {#cb16 .sourceCode}"
#> [438] "``` {.sourceCode .r}"
#> [439] "starwars %>%"
#> [440] " mutate("
#> [441] " height_m = height / 100,"
#> [442] " BMI = mass / (height_m^2),"
#> [443] " .keep = \"none\""
#> [444] " )"
#> [445] "#> # A tibble: 87 × 2"
#> [446] "#> height_m BMI"
#> [447] "#> <dbl> <dbl>"
#> [448] "#> 1 1.72 26.0"
#> [449] "#> 2 1.67 26.9"
#> [450] "#> 3 0.96 34.7"
#> [451] "#> 4 2.02 33.3"
#> [452] "#> # ℹ 83 more rows"
#> [453] "```"
#> [454] ":::"
#> [455] ":::"
#> [456] ""
#> [457] "::: {#change-column-order-with-relocate .section .level3}"
#> [458] "### Change column order with `relocate()`"
#> [459] ""
#> [460] "Use a similar syntax as `select()` to move blocks of columns at once"
#> [461] ""
#> [462] "::: {#cb17 .sourceCode}"
#> [463] "``` {.sourceCode .r}"
#> [464] "starwars %>% relocate(sex:homeworld, .before = height)"
#> [465] "#> # A tibble: 87 × 14"
#> [466] "#> name sex gender homeworld height mass hair_color skin_color eye_color"
#> [467] "#> <chr> <chr> <chr> <chr> <int> <dbl> <chr> <chr> <chr> "
#> [468] "#> 1 Luke Skyw… male mascu… Tatooine 172 77 blond fair blue "
#> [469] "#> 2 C-3PO none mascu… Tatooine 167 75 <NA> gold yellow "
#> [470] "#> 3 R2-D2 none mascu… Naboo 96 32 <NA> white, bl… red "
#> [471] "#> 4 Darth Vad… male mascu… Tatooine 202 136 none white yellow "
#> [472] "#> # ℹ 83 more rows"
#> [473] "#> # ℹ 5 more variables: birth_year <dbl>, species <chr>, films <list>,"
#> [474] "#> # vehicles <list>, starships <list>"
#> [475] "```"
#> [476] ":::"
#> [477] ":::"
#> [478] ""
#> [479] "::: {#summarise-values-with-summarise .section .level3}"
#> [480] "### Summarise values with `summarise()`"
#> [481] ""
#> [482] "The last verb is `summarise()`. It collapses a data frame to a single"
#> [483] "row."
#> [484] ""
#> [485] "::: {#cb18 .sourceCode}"
#> [486] "``` {.sourceCode .r}"
#> [487] "starwars %>% summarise(height = mean(height, na.rm = TRUE))"
#> [488] "#> # A tibble: 1 × 1"
#> [489] "#> height"
#> [490] "#> <dbl>"
#> [491] "#> 1 175."
#> [492] "```"
#> [493] ":::"
#> [494] ""
#> [495] "It's not that useful until we learn the `group_by()` verb below."
#> [496] ":::"
#> [497] ""
#> [498] "::: {#commonalities .section .level3}"
#> [499] "### Commonalities"
#> [500] ""
#> [501] "You may have noticed that the syntax and function of all these verbs are"
#> [502] "very similar:"
#> [503] ""
#> [504] "- The first argument is a data frame."
#> [505] ""
#> [506] "- The subsequent arguments describe what to do with the data frame."
#> [507] " You can refer to columns in the data frame directly without using"
#> [508] " `$`."
#> [509] ""
#> [510] "- The result is a new data frame"
#> [511] ""
#> [512] "Together these properties make it easy to chain together multiple simple"
#> [513] "steps to achieve a complex result."
#> [514] ""
#> [515] "These five functions provide the basis of a language of data"
#> [516] "manipulation. At the most basic level, you can only alter a tidy data"
#> [517] "frame in five useful ways: you can reorder the rows (`arrange()`), pick"
#> [518] "observations and variables of interest (`filter()` and `select()`), add"
#> [519] "new variables that are functions of existing variables (`mutate()`), or"
#> [520] "collapse many values to a summary (`summarise()`)."
#> [521] ":::"
#> [522] ":::"
#> [523] ""
#> [524] "::: {#combining-functions-with .section .level2}"
#> [525] "## Combining functions with `%>%`"
#> [526] ""
#> [527] "The dplyr API is functional in the sense that function calls don't have"
#> [528] "side-effects. You must always save their results. This doesn't lead to"
#> [529] "particularly elegant code, especially if you want to do many operations"
#> [530] "at once. You either have to do it step-by-step:"
#> [531] ""
#> [532] "::: {#cb19 .sourceCode}"
#> [533] "``` {.sourceCode .r}"
#> [534] "a1 <- group_by(starwars, species, sex)"
#> [535] "a2 <- select(a1, height, mass)"
#> [536] "a3 <- summarise(a2,"
#> [537] " height = mean(height, na.rm = TRUE),"
#> [538] " mass = mean(mass, na.rm = TRUE)"
#> [539] ")"
#> [540] "```"
#> [541] ":::"
#> [542] ""
#> [543] "Or if you don't want to name the intermediate results, you need to wrap"
#> [544] "the function calls inside each other:"
#> [545] ""
#> [546] "::: {#cb20 .sourceCode}"
#> [547] "``` {.sourceCode .r}"
#> [548] "summarise("
#> [549] " select("
#> [550] " group_by(starwars, species, sex),"
#> [551] " height, mass"
#> [552] " ),"
#> [553] " height = mean(height, na.rm = TRUE),"
#> [554] " mass = mean(mass, na.rm = TRUE)"
#> [555] ")"
#> [556] "#> Adding missing grouping variables: `species`, `sex`"
#> [557] "#> `summarise()` has grouped output by 'species'. You can override using the"
#> [558] "#> `.groups` argument."
#> [559] "#> # A tibble: 41 × 4"
#> [560] "#> # Groups: species [38]"
#> [561] "#> species sex height mass"
#> [562] "#> <chr> <chr> <dbl> <dbl>"
#> [563] "#> 1 Aleena male 79 15"
#> [564] "#> 2 Besalisk male 198 102"
#> [565] "#> 3 Cerean male 198 82"
#> [566] "#> 4 Chagrian male 196 NaN"
#> [567] "#> # ℹ 37 more rows"
#> [568] "```"
#> [569] ":::"
#> [570] ""
#> [571] "This is difficult to read because the order of the operations is from"
#> [572] "inside to out. Thus, the arguments are a long way away from the"
#> [573] "function. To get around this problem, dplyr provides the `%>%` operator"
#> [574] "from magrittr. `x %>% f(y)` turns into `f(x, y)` so you can use it to"
#> [575] "rewrite multiple operations that you can read left-to-right,"
#> [576] "top-to-bottom (reading the pipe operator as \"then\"):"
#> [577] ""
#> [578] "::: {#cb21 .sourceCode}"
#> [579] "``` {.sourceCode .r}"
#> [580] "starwars %>%"
#> [581] " group_by(species, sex) %>%"
#> [582] " select(height, mass) %>%"
#> [583] " summarise("
#> [584] " height = mean(height, na.rm = TRUE),"
#> [585] " mass = mean(mass, na.rm = TRUE)"
#> [586] " )"
#> [587] "```"
#> [588] ":::"
#> [589] ":::"
#> [590] ""
#> [591] "::: {#patterns-of-operations .section .level2}"
#> [592] "## Patterns of operations"
#> [593] ""
#> [594] "The dplyr verbs can be classified by the type of operations they"
#> [595] "accomplish (we sometimes speak of their **semantics**, i.e., their"
#> [596] "meaning). It's helpful to have a good grasp of the difference between"
#> [597] "select and mutate operations."
#> [598] ""
#> [599] "::: {#selecting-operations .section .level3}"
#> [600] "### Selecting operations"
#> [601] ""
#> [602] "One of the appealing features of dplyr is that you can refer to columns"
#> [603] "from the tibble as if they were regular variables. However, the"
#> [604] "syntactic uniformity of referring to bare column names hides semantical"
#> [605] "differences across the verbs. A column symbol supplied to `select()`"
#> [606] "does not have the same meaning as the same symbol supplied to"
#> [607] "`mutate()`."
#> [608] ""
#> [609] "Selecting operations expect column names and positions. Hence, when you"
#> [610] "call `select()` with bare variable names, they actually represent their"
#> [611] "own positions in the tibble. The following calls are completely"
#> [612] "equivalent from dplyr's point of view:"
#> [613] ""
#> [614] "::: {#cb22 .sourceCode}"
#> [615] "``` {.sourceCode .r}"
#> [616] "# `name` represents the integer 1"
#> [617] "select(starwars, name)"
#> [618] "#> # A tibble: 87 × 1"
#> [619] "#> name "
#> [620] "#> <chr> "
#> [621] "#> 1 Luke Skywalker"
#> [622] "#> 2 C-3PO "
#> [623] "#> 3 R2-D2 "
#> [624] "#> 4 Darth Vader "
#> [625] "#> # ℹ 83 more rows"
#> [626] "select(starwars, 1)"
#> [627] "#> # A tibble: 87 × 1"
#> [628] "#> name "
#> [629] "#> <chr> "
#> [630] "#> 1 Luke Skywalker"
#> [631] "#> 2 C-3PO "
#> [632] "#> 3 R2-D2 "
#> [633] "#> 4 Darth Vader "
#> [634] "#> # ℹ 83 more rows"
#> [635] "```"
#> [636] ":::"
#> [637] ""
#> [638] "By the same token, this means that you cannot refer to variables from"
#> [639] "the surrounding context if they have the same name as one of the"
#> [640] "columns. In the following example, `height` still represents 2, not 5:"
#> [641] ""
#> [642] "::: {#cb23 .sourceCode}"
#> [643] "``` {.sourceCode .r}"
#> [644] "height <- 5"
#> [645] "select(starwars, height)"
#> [646] "#> # A tibble: 87 × 1"
#> [647] "#> height"
#> [648] "#> <int>"
#> [649] "#> 1 172"
#> [650] "#> 2 167"
#> [651] "#> 3 96"
#> [652] "#> 4 202"
#> [653] "#> # ℹ 83 more rows"
#> [654] "```"
#> [655] ":::"
#> [656] ""
#> [657] "One useful subtlety is that this only applies to bare names and to"
#> [658] "selecting calls like `c(height, mass)` or `height:mass`. In all other"
#> [659] "cases, the columns of the data frame are not put in scope. This allows"
#> [660] "you to refer to contextual variables in selection helpers:"
#> [661] ""
#> [662] "::: {#cb24 .sourceCode}"
#> [663] "``` {.sourceCode .r}"
#> [664] "name <- \"color\""
#> [665] "select(starwars, ends_with(name))"
#> [666] "#> # A tibble: 87 × 3"
#> [667] "#> hair_color skin_color eye_color"
#> [668] "#> <chr> <chr> <chr> "
#> [669] "#> 1 blond fair blue "
#> [670] "#> 2 <NA> gold yellow "
#> [671] "#> 3 <NA> white, blue red "
#> [672] "#> 4 none white yellow "
#> [673] "#> # ℹ 83 more rows"
#> [674] "```"
#> [675] ":::"
#> [676] ""
#> [677] "These semantics are usually intuitive. But note the subtle difference:"
#> [678] ""
#> [679] "::: {#cb25 .sourceCode}"
#> [680] "``` {.sourceCode .r}"
#> [681] "name <- 5"
#> [682] "select(starwars, name, identity(name))"
#> [683] "#> # A tibble: 87 × 2"
#> [684] "#> name skin_color "
#> [685] "#> <chr> <chr> "
#> [686] "#> 1 Luke Skywalker fair "
#> [687] "#> 2 C-3PO gold "
#> [688] "#> 3 R2-D2 white, blue"
#> [689] "#> 4 Darth Vader white "
#> [690] "#> # ℹ 83 more rows"
#> [691] "```"
#> [692] ":::"
#> [693] ""
#> [694] "In the first argument, `name` represents its own position `1`. In the"
#> [695] "second argument, `name` is evaluated in the surrounding context and"
#> [696] "represents the fifth column."
#> [697] ""
#> [698] "For a long time, `select()` used to only understand column positions."
#> [699] "Counting from dplyr 0.6, it now understands column names as well. This"
#> [700] "makes it a bit easier to program with `select()`:"
#> [701] ""
#> [702] "::: {#cb26 .sourceCode}"
#> [703] "``` {.sourceCode .r}"
#> [704] "vars <- c(\"name\", \"height\")"
#> [705] "select(starwars, all_of(vars), \"mass\")"
#> [706] "#> # A tibble: 87 × 3"
#> [707] "#> name height mass"
#> [708] "#> <chr> <int> <dbl>"
#> [709] "#> 1 Luke Skywalker 172 77"
#> [710] "#> 2 C-3PO 167 75"
#> [711] "#> 3 R2-D2 96 32"
#> [712] "#> 4 Darth Vader 202 136"
#> [713] "#> # ℹ 83 more rows"
#> [714] "```"
#> [715] ":::"
#> [716] ":::"
#> [717] ""
#> [718] "::: {#mutating-operations .section .level3}"
#> [719] "### Mutating operations"
#> [720] ""
#> [721] "Mutate semantics are quite different from selection semantics. Whereas"
#> [722] "`select()` expects column names or positions, `mutate()` expects *column"
#> [723] "vectors*. We will set up a smaller tibble to use for our examples."
#> [724] ""
#> [725] "::: {#cb27 .sourceCode}"
#> [726] "``` {.sourceCode .r}"
#> [727] "df <- starwars %>% select(name, height, mass)"
#> [728] "```"
#> [729] ":::"
#> [730] ""
#> [731] "When we use `select()`, the bare column names stand for their own"
#> [732] "positions in the tibble. For `mutate()` on the other hand, column"
#> [733] "symbols represent the actual column vectors stored in the tibble."
#> [734] "Consider what happens if we give a string or a number to `mutate()`:"
#> [735] ""
#> [736] "::: {#cb28 .sourceCode}"
#> [737] "``` {.sourceCode .r}"
#> [738] "mutate(df, \"height\", 2)"
#> [739] "#> # A tibble: 87 × 5"
#> [740] "#> name height mass `\"height\"` `2`"
#> [741] "#> <chr> <int> <dbl> <chr> <dbl>"
#> [742] "#> 1 Luke Skywalker 172 77 height 2"
#> [743] "#> 2 C-3PO 167 75 height 2"
#> [744] "#> 3 R2-D2 96 32 height 2"
#> [745] "#> 4 Darth Vader 202 136 height 2"
#> [746] "#> # ℹ 83 more rows"
#> [747] "```"
#> [748] ":::"
#> [749] ""
#> [750] "`mutate()` gets length-1 vectors that it interprets as new columns in"
#> [751] "the data frame. These vectors are recycled so they match the number of"
#> [752] "rows. That's why it doesn't make sense to supply expressions like"
#> [753] "`\"height\" + 10` to `mutate()`. This amounts to adding 10 to a string!"
#> [754] "The correct expression is:"
#> [755] ""
#> [756] "::: {#cb29 .sourceCode}"
#> [757] "``` {.sourceCode .r}"
#> [758] "mutate(df, height + 10)"
#> [759] "#> # A tibble: 87 × 4"
#> [760] "#> name height mass `height + 10`"
#> [761] "#> <chr> <int> <dbl> <dbl>"
#> [762] "#> 1 Luke Skywalker 172 77 182"
#> [763] "#> 2 C-3PO 167 75 177"
#> [764] "#> 3 R2-D2 96 32 106"
#> [765] "#> 4 Darth Vader 202 136 212"
#> [766] "#> # ℹ 83 more rows"
#> [767] "```"
#> [768] ":::"
#> [769] ""
#> [770] "In the same way, you can unquote values from the context if these values"
#> [771] "represent a valid column. They must be either length 1 (they then get"
#> [772] "recycled) or have the same length as the number of rows. In the"
#> [773] "following example we create a new vector that we add to the data frame:"
#> [774] ""
#> [775] "::: {#cb30 .sourceCode}"
#> [776] "``` {.sourceCode .r}"
#> [777] "var <- seq(1, nrow(df))"
#> [778] "mutate(df, new = var)"
#> [779] "#> # A tibble: 87 × 4"
#> [780] "#> name height mass new"
#> [781] "#> <chr> <int> <dbl> <int>"
#> [782] "#> 1 Luke Skywalker 172 77 1"
#> [783] "#> 2 C-3PO 167 75 2"
#> [784] "#> 3 R2-D2 96 32 3"
#> [785] "#> 4 Darth Vader 202 136 4"
#> [786] "#> # ℹ 83 more rows"
#> [787] "```"
#> [788] ":::"
#> [789] ""
#> [790] "A case in point is `group_by()`. While you might think it has select"
#> [791] "semantics, it actually has mutate semantics. This is quite handy as it"
#> [792] "allows to group by a modified column:"
#> [793] ""
#> [794] "::: {#cb31 .sourceCode}"
#> [795] "``` {.sourceCode .r}"
#> [796] "group_by(starwars, sex)"
#> [797] "#> # A tibble: 87 × 14"
#> [798] "#> # Groups: sex [5]"
#> [799] "#> name height mass hair_color skin_color eye_color birth_year sex gender"
#> [800] "#> <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr> "
#> [801] "#> 1 Luke Sky… 172 77 blond fair blue 19 male mascu…"
#> [802] "#> 2 C-3PO 167 75 <NA> gold yellow 112 none mascu…"
#> [803] "#> 3 R2-D2 96 32 <NA> white, bl… red 33 none mascu…"
#> [804] "#> 4 Darth Va… 202 136 none white yellow 41.9 male mascu…"
#> [805] "#> # ℹ 83 more rows"
#> [806] "#> # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,"
#> [807] "#> # vehicles <list>, starships <list>"
#> [808] "group_by(starwars, sex = as.factor(sex))"
#> [809] "#> # A tibble: 87 × 14"
#> [810] "#> # Groups: sex [5]"
#> [811] "#> name height mass hair_color skin_color eye_color birth_year sex gender"
#> [812] "#> <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <fct> <chr> "
#> [813] "#> 1 Luke Sky… 172 77 blond fair blue 19 male mascu…"
#> [814] "#> 2 C-3PO 167 75 <NA> gold yellow 112 none mascu…"
#> [815] "#> 3 R2-D2 96 32 <NA> white, bl… red 33 none mascu…"
#> [816] "#> 4 Darth Va… 202 136 none white yellow 41.9 male mascu…"
#> [817] "#> # ℹ 83 more rows"
#> [818] "#> # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,"
#> [819] "#> # vehicles <list>, starships <list>"
#> [820] "group_by(starwars, height_binned = cut(height, 3))"
#> [821] "#> # A tibble: 87 × 15"
#> [822] "#> # Groups: height_binned [4]"
#> [823] "#> name height mass hair_color skin_color eye_color birth_year sex gender"
#> [824] "#> <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr> "
#> [825] "#> 1 Luke Sky… 172 77 blond fair blue 19 male mascu…"
#> [826] "#> 2 C-3PO 167 75 <NA> gold yellow 112 none mascu…"
#> [827] "#> 3 R2-D2 96 32 <NA> white, bl… red 33 none mascu…"
#> [828] "#> 4 Darth Va… 202 136 none white yellow 41.9 male mascu…"
#> [829] "#> # ℹ 83 more rows"
#> [830] "#> # ℹ 6 more variables: homeworld <chr>, species <chr>, films <list>,"
#> [831] "#> # vehicles <list>, starships <list>, height_binned <fct>"
#> [832] "```"
#> [833] ":::"
#> [834] ""
#> [835] "This is why you can't supply a column name to `group_by()`. This amounts"
#> [836] "to creating a new column containing the string recycled to the number of"
#> [837] "rows:"
#> [838] ""
#> [839] "::: {#cb32 .sourceCode}"
#> [840] "``` {.sourceCode .r}"
#> [841] "group_by(df, \"month\")"
#> [842] "#> # A tibble: 87 × 4"
#> [843] "#> # Groups: \"month\" [1]"
#> [844] "#> name height mass `\"month\"`"
#> [845] "#> <chr> <int> <dbl> <chr> "
#> [846] "#> 1 Luke Skywalker 172 77 month "
#> [847] "#> 2 C-3PO 167 75 month "
#> [848] "#> 3 R2-D2 96 32 month "
#> [849] "#> 4 Darth Vader 202 136 month "
#> [850] "#> # ℹ 83 more rows"
#> [851] "```"
#> [852] ":::"
#> [853] ":::"
#> [854] ":::"
# Files ----
btw_this("./") # list files in the current working directory
#> [1] "| path | type | size | modification_time |\n|------|------|------|-------------------|\n| btw-package.html | file | 7.02K | 2025-05-13 03:01:05 |\n| btw.html | file | 14.49K | 2025-05-13 03:01:06 |\n| btw_client.html | file | 18.95K | 2025-05-13 03:01:06 |\n| index.html | file | 8.27K | 2025-05-13 03:01:05 |"